Well, the exercise tells us that the columns are the samples, so there are 2 dimensions.
There are 5 samples.
We compute the covariance matrix in the following way:
There are a few methods for matrix multiplication:
Remember that the columns of the matrix on the left are where the basis vectors land.
So, for each column of the matrix on the right, you substitute the basis vectors with the corresponding columns of the left matrix and take that linear combination.
You compute the dot product between the rows of the first matrix and the columns of the second matrix.
It might be in the exam.
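A quick sketch of that covariance computation in NumPy (the data values are made up; columns are samples, as the exercise says):

```python
import numpy as np

# Hypothetical data matrix: columns are samples (2 dimensions, 5 samples).
X = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
              [2.0, 4.0, 6.0, 8.0, 10.0]])

# Center each dimension (row) by subtracting its mean.
Xc = X - X.mean(axis=1, keepdims=True)

# Covariance matrix: (1 / (n - 1)) * Xc @ Xc.T  -> shape (2, 2)
n = X.shape[1]
C = Xc @ Xc.T / (n - 1)

# Sanity check: np.cov's default also treats rows as variables.
assert np.allclose(C, np.cov(X))
print(C)
```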
The variance stays the same but the covariance changes sign:
before:
[[ 1.0  -0.8]
 [-0.8   1.0]]
after:
[[ 1.0   0.8]
 [ 0.8   1.0]]
The absolute value of the determinant doesn't change at all because the space doesn't get stretched or shrunk, just rotated (or reflected).
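A small NumPy check of both claims, using the covariance values from above (the reflection of one axis is an assumed transformation):

```python
import numpy as np

# Covariance from the notes, with negative correlation.
C = np.array([[1.0, -0.8],
              [-0.8, 1.0]])

# Reflect the second axis: x -> x, y -> -y.
R = np.array([[1.0, 0.0],
              [0.0, -1.0]])

# Under the linear map R, the covariance transforms as R C R^T.
C_new = R @ C @ R.T
print(C_new)   # variances unchanged, covariance flips sign

# |det| is preserved: reflections/rotations don't stretch or shrink space.
assert np.isclose(abs(np.linalg.det(C_new)), abs(np.linalg.det(C)))
```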
We need to apply a stretching transformation after step 4.
In order to do that, we scale the basis vectors by 5%, i.e. multiply them by 1.05.
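A minimal sketch of that 5% stretch (the vector is made up):

```python
import numpy as np

# Uniform 5% stretch: every basis vector gets scaled by 1.05.
S = 1.05 * np.eye(2)

v = np.array([2.0, -1.0])   # made-up vector
print(S @ v)                 # [ 2.1  -1.05]
```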
The classifier won't work because we are projecting the data in a dimension where the classifier can't tell the difference between the two classes.
These will be the 2 principal components.
If we project all data on the first PC, we get something like this:
You can't distinguish classes in this scatterplot.
So we kill the first principal component and project everything onto the second.
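A sketch of this with made-up two-class data: the classes overlap along PC1 (the large-variance direction) but separate along PC2, so projecting on PC2 keeps them distinguishable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical classes: same spread along x (large variance),
# different means along y (small variance).
class_a = rng.normal([0.0,  1.0], [5.0, 0.3], size=(100, 2))
class_b = rng.normal([0.0, -1.0], [5.0, 0.3], size=(100, 2))
X = np.vstack([class_a, class_b])

# Principal components = eigenvectors of the covariance matrix,
# sorted by decreasing eigenvalue.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc.T))
order = np.argsort(eigvals)[::-1]
pcs = eigvecs[:, order]

# "Kill" PC1: project everything onto PC2 only.
proj_pc2 = Xc @ pcs[:, 1]

# The two classes end up on opposite sides along PC2.
print(proj_pc2[:100].mean(), proj_pc2[100:].mean())
```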
It would be best to go back and study the specifics of the Expectation Maximization algorithm.
Since he doesn't actually want us to compute the results (I think), we are just going to write down the equations (not the Gaussian one).
The following formula gives the density of the GMM at a point x:

p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)

where the mixing coefficients pi_k sum to 1 and N(x | mu_k, Sigma_k) is the Gaussian pdf of component k.
The determinant of the covariance matrix shows up in the normalization constant of the Gaussian pdf:

N(x | mu, Sigma) = exp(-1/2 (x - mu)^T Sigma^-1 (x - mu)) / sqrt((2*pi)^d * |Sigma|)

The |Sigma| in the denominator is there to counteract the change in volume that the covariance matrix applies to the space, so the density still integrates to 1.
In Masi's words:
The location parameter mu tells us where the Gaussian is centered.
The exercise is telling us that at this point every Gaussian evaluates to the same density, so the Gaussian terms cancel in the responsibility formula.
So the responsibilities are just the mixing coefficients.
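A tiny check of this cancellation (the mixing coefficients and the common density value are made up):

```python
import numpy as np

# Hypothetical mixing coefficients.
pi = np.array([0.5, 0.3, 0.2])

# Assumed scenario from the exercise: every Gaussian gives the
# same density at the point x.
densities = np.array([0.07, 0.07, 0.07])

# Responsibilities: gamma_k = pi_k * N_k / sum_j pi_j * N_j
gamma = pi * densities / np.sum(pi * densities)
print(gamma)   # equals pi: the common density cancels out
```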
In this case, after PCA we are left with 10k images in the form of 50-component vectors whose entries take values in [0, 255].
But Masi also gives us the mean of the original images and the U matrix used to compress them.
This is useful because we can reverse the process and reconstruct (an approximation of) the original images from the compressed ones.
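A sketch of that decompression step. The original dimensionality, the U matrix, the mean, and the compressed vectors here are all stand-ins, since we only know the shapes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shapes: images flattened to d pixels, compressed to 50 components.
d = 784                                          # hypothetical original size
U = np.linalg.qr(rng.normal(size=(d, 50)))[0]    # stand-in orthonormal basis
mean = rng.uniform(0, 255, size=d)               # stand-in mean image
Z = rng.normal(size=(10_000, 50))                # stand-in compressed vectors

# Reverse the PCA compression: x_rec = U z + mean (an approximation of x).
X_rec = Z @ U.T + mean
print(X_rec.shape)   # (10000, 784)
```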
If we want to generate those samples using a GMM, repeat the following process 10K times:
This is because the Gaussians don't all have the same mixing weight, so some are more likely to be picked than others.
To handle this, we do a weighted random pick of the Gaussian, using the mixing coefficients as the weights.
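A sketch of that sampling loop (all the GMM parameters are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical GMM: 3 components in 2D.
weights = np.array([0.6, 0.3, 0.1])
means = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
covs = np.array([np.eye(2), 0.5 * np.eye(2), 2.0 * np.eye(2)])

def sample_gmm(n):
    # 1) weighted random pick of the component for each sample,
    # 2) draw from the chosen Gaussian.
    ks = rng.choice(len(weights), size=n, p=weights)
    return np.array([rng.multivariate_normal(means[k], covs[k]) for k in ks])

samples = sample_gmm(10_000)
print(samples.shape)   # (10000, 2)
```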
Go back and study Gini impurity and Misclassification!
The entropy of the tree is given by the entropy of the leaves, weighted by the number of elements contained in each of them:
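A small worked example of that weighted entropy (the leaf counts are made up):

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (base 2) of a leaf from its per-class counts."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return -np.sum(p * np.log2(p))

# Hypothetical tree with two leaves: per-class counts in each leaf.
leaves = [np.array([8, 0]),    # pure leaf: entropy 0
          np.array([1, 1])]    # maximally mixed leaf: entropy 1

# Tree entropy = leaf entropies weighted by leaf sizes.
sizes = np.array([leaf.sum() for leaf in leaves])
tree_entropy = np.sum(sizes / sizes.sum() * np.array([entropy(l) for l in leaves]))
print(tree_entropy)   # (8*0 + 2*1) / 10 = 0.2
```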
Masi gives us the sample [-1, -1000].
We know that:
Since there is equal probability that the class is square or triangle, it's best to say that the algorithm is unsure here.
We cannot use the following strategies because:
So we use Bagging, an ensemble technique that trains a decision tree on each of several bootstrap samples of the dataset (random draws with replacement).
Then, to make inference, we average the outputs of the trees (this is essentially a random forest) and we have got our model with reduced variance.
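A rough sketch of bagging, using tiny decision stumps instead of full trees (the dataset is made up; real bagging would use proper decision trees):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D toy dataset: class = 1 when x > 0.
X = rng.normal(size=300)
y = (X > 0).astype(int)

def fit_stump(X, y):
    """Tiny decision tree (a stump): pick the threshold with fewest errors."""
    best = (np.inf, 0.0)
    for t in np.unique(X):
        err = np.sum((X > t).astype(int) != y)
        if err < best[0]:
            best = (err, t)
    return best[1]

# Bagging: each stump is trained on a bootstrap sample (drawn with replacement).
thresholds = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))
    thresholds.append(fit_stump(X[idx], y[idx]))

# Inference: average the individual predictions, then take a majority vote.
votes = np.mean([(X > t).astype(int) for t in thresholds], axis=0)
pred = (votes > 0.5).astype(int)
print((pred == y).mean())   # accuracy of the bagged ensemble
```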
We apply inverse transform sampling on the probabilities of the elements in the leaves? (Not at all sure about this one.)
The true positive rate, also referred to as sensitivity or recall, measures the percentage of actual positives which are correctly identified.
The false positive rate is its counterpart for negatives: the percentage of actual negatives that get wrongly classified as positive.
The ROC curve is a performance measurement for classification problems at various threshold settings.
It plots TPR against FPR as the decision threshold varies.
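A sketch of how those points are computed (the scores and labels are made up):

```python
import numpy as np

# Hypothetical classifier scores and true binary labels.
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1])
labels = np.array([1,   1,   0,   1,   0,    1,   0,   0,   1,   0])

P, N = labels.sum(), (1 - labels).sum()

# Sweep the decision threshold and record (FPR, TPR) at each setting.
points = []
for t in np.concatenate(([np.inf], np.sort(scores)[::-1])):
    pred = (scores >= t).astype(int)
    tp = np.sum((pred == 1) & (labels == 1))
    fp = np.sum((pred == 1) & (labels == 0))
    points.append((fp / N, tp / P))

print(points[0], points[-1])   # the curve runs from (0, 0) to (1, 1)
```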
Layer 1) We have 2048 weights for each neuron of the first layer, so:
BUT THIS LAYER IS NOT TRAINABLE, SO IT DOESNT COUNT.
Layer 2) We have 1024 weights for each neuron of the second layer, so:
Layer 3) We have 512 weights for each neuron of the third layer, so:
When we sum them all together, we get the total number of parameters:
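The counting rule, sketched as code. The architecture (2048 -> 1024 -> 512 -> 10) and especially the output size of 10 are assumptions, not given by the exercise; plug in the real sizes:

```python
# A dense layer with n_in inputs and n_out neurons has n_in weights
# per neuron, plus one bias per neuron.
def layer_params(n_in, n_out, bias=True):
    return n_in * n_out + (n_out if bias else 0)

# Hypothetical architecture: 2048 -> 1024 -> 512 -> 10.
# The first layer (2048 -> 1024) is not trainable, so it doesn't count.
trainable = layer_params(1024, 512) + layer_params(512, 10)
print(trainable)   # 1024*512 + 512 + 512*10 + 10 = 529930
```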
wtf?
At the last gate, we compute the derivatives with respect to each component of
Now, in order to apply the chain rule, we would need to know the derivatives of
Let's kill a fucking gate.
The ROC is computed from TPR and FPR, which are obtained by dividing TP by (TP + FN) and FP by (FP + TN), respectively.
The AUC is the area under the ROC curve, computed by integrating TPR over FPR (e.g. with the trapezoidal rule).
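A sketch of that computation with a made-up ROC curve:

```python
import numpy as np

# Hypothetical ROC points (FPR, TPR), sorted by increasing FPR.
fpr = np.array([0.0, 0.0, 0.25, 0.5, 1.0])
tpr = np.array([0.0, 0.5, 0.75, 1.0, 1.0])

# Trapezoidal rule: sum of segment widths times average heights.
auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)
print(auc)   # 0.875
```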